LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital

Gene Ontology synonym generation rules lead to increased performance in biomedical concept recognition

Author: Christopher S. Funk
K. Bretonnel Cohen
Karin M. Verspoor
Lawrence E. Hunter
Publication venue: Springer Nature
Publication date: 01/01/2016

Protein annotation as term categorization in the gene ontology using word proximity networks

Author: Cohn Judith
Joslyn Cliff
Mniszewski Sue
Rechtsteiner Andreas
Rocha Luis M
Simas Tiago
Verspoor Karin
Publication venue: BioMed Central
Publication date: 01/01/2005

We addressed BioCreAtIvE Task 2, the problem of annotation of a protein with a node in the Gene Ontology (GO). We approached the task as a problem of categorizing terms derived from the document neighborhood of the given protein in the given document into nodes in the GO based on the lexical overlaps with terms on GO nodes and terms identified as related to those nodes. The system incorporates NLP components such as a morphological normalizer, a named entity recognizer, a statistical term frequency analyzer, and an unsupervised method for expanding words associated with GO ids based on a probability measure that captures word proximity (Rocha, 2002). The categorization methodology uses our novel Gene Ontology Categorizer (GOC) methodology (Joslyn et al. 2004) to select GO nodes as cluster heads for the terms in the input set based on the structure of the GO. Pre-processing: Swiss-Prot and TrEMBL IDs were provided as input identifiers for the protein, so we needed to establish a set of names by which that protein could be referenced in the text. We made use of both the gene name and protein names that are in Swiss-Prot itself, when available, and a collection of synonyms constructed by Procter & Gamble Company. The fallback case was to us

CiteSeerX

The textual characteristics of traditional and Open Access scientific journals are similar

Author: A Knebel
A Swan
C Blaschke
D Biber
D Ferrucci
DP Corney
G Eysenbach
K Bretonnel Cohen
K Curran
K Verspoor
Karin Verspoor
KB Cohen
L Tanabe
Lawrence Hunter
M Krallinger
M Palmer
MP Marcus
P Rayson
PK Shah
S Kullback
T Dunning
W Hersh
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Recent years have seen an increased amount of natural language processing (NLP) work on full text biomedical journal publications. Much of this work is done with Open Access journal articles. Such work assumes that Open Access articles are representative of biomedical publications in general and that methods developed for analysis of Open Access full text publications will generalize to the biomedical literature as a whole. If this assumption is wrong, the cost to the community will be large, including not just wasted resources, but also flawed science. This paper examines that assumption. Results: We collected two sets of documents, one consisting only of Open Access publications and the other consisting only of traditional journal publications. We examined them for differences in surface linguistic structures that have obvious consequences for the ease or difficulty of natural language processing and for differences in semantic content as reflected in lexical items. Regarding surface linguistic structures, we examined the incidence of conjunctions, negation, passives, and pronominal anaphora, and found that the two collections did not differ. We also examined the distribution of sentence lengths and found that both collections were characterized by the same mode. Regarding lexical items, we found that the Kullback-Leibler divergence between the two collections was low, and was lower than the divergence between either collection and a reference corpus. Where small differences did exist, log likelihood analysis showed that they were primarily in the area of formatting and in specific named entities. Conclusion: We did not find structural or semantic differences between the Open Access and traditional journal collections.
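
The lexical comparison described in this abstract can be illustrated with a small sketch. This is not the authors' code: the add-alpha smoothing, the whitespace tokenization, the function names, and the toy sentences are all assumptions made here for illustration. It only shows how a Kullback-Leibler divergence between two unigram distributions might be computed and compared against a reference corpus.

```python
import math
from collections import Counter


def word_distribution(tokens, vocabulary, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a shared vocabulary.

    Smoothing (an assumption of this sketch) keeps every probability
    positive, so the divergence below is always finite.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocabulary)
    return {word: (counts[word] + alpha) / total for word in vocabulary}


def kl_divergence(p, q):
    """D_KL(P || Q) in bits; p and q must be defined over the same words."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)


# Toy stand-ins for the three collections (hypothetical data).
open_access = "the protein binds the receptor".split()
traditional = "the protein binds the ligand".split()
reference = "the cat sat on the mat".split()

vocab = set(open_access) | set(traditional) | set(reference)
p = word_distribution(open_access, vocab)
q = word_distribution(traditional, vocab)
r = word_distribution(reference, vocab)

print("OA vs traditional:", kl_divergence(p, q))
print("OA vs reference:  ", kl_divergence(p, r))
```

In this toy setup the two biomedical-style samples diverge less from each other than either does from the unrelated reference sample, which mirrors the shape of the comparison reported in the abstract, not its actual numbers.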

Public Library of Science (PLOS)

Text Mining Improves Prediction of Protein Functional Sites

Author: A Koussounadis
A Sokolov
AG Murzin
AR Atilgan
AT Laurie
BJ Grant
CA Earhart
CB Ahlers
CJO Baker
CM Nunn
CT Porter
D Ferrucci
D Ming
D Oliver
D Zhou
DS Greer
F Horn
F Leitner
GL Card
HJ Nam
HM Berman
I Bahar
J Dundas
J Laurila
J Ory
JD Cohn
JG Caporaso
JK Hurley
JM Jez
Judith D. Cohn
JY Choe
K Hinsen
K Nagel
K Verspoor
Karin M. Verspoor
KB Cohen
KE Ravikumar
KL Damm
Komandur E. Ravikumar
L Hu
L Xie
LH Weaver
LJ Jensen
LL Huang
M Ankerst
M Krallinger
ME Wall
MF Sanner
Michael E. Wall
ML Benson
MM Tirion
N Chim
Neil R. Smalheiser
PE Bourne
R Gaizauskas
R Witte
RC Edgar
S Perot
TW Schwartz
WA Baumgartner Jr
Publication venue: Public Library of Science
Publication date: 29/02/2012

We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions
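
As a rough illustration of the text-mining step, the sketch below pulls residue mentions such as "His57" or "Ser-195" out of a sentence. It is not the LEAP-FS implementation: the regular expression, the restriction to three-letter amino acid codes, and the function name are assumptions made here, and a real system would need to handle many more mention formats.

```python
import re

# Standard three-letter codes for the 20 common amino acids.
THREE_LETTER = (
    "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
    "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
)

# Hypothetical pattern: a three-letter code, an optional hyphen or space,
# then the residue position, e.g. "Asp102", "Ser-195", "Gly 12".
RESIDUE_MENTION = re.compile(rf"\b({THREE_LETTER})[- ]?(\d+)\b")


def extract_residue_mentions(text):
    """Return the sorted, de-duplicated (residue, position) pairs in text."""
    found = {(m.group(1), int(m.group(2))) for m in RESIDUE_MENTION.finditer(text)}
    return sorted(found)


sentence = (
    "The catalytic triad comprises His57, Asp102 and Ser-195; "
    "His57 acts as a general base."
)
print(extract_residue_mentions(sentence))
```

Each extracted pair could then be treated as a candidate functional-site residue and compared with structure-based predictions, which is the kind of agreement check the abstract describes.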

Helsingin yliopiston digitaalinen arkisto

The Dagstuhl Perspectives Workshop on Performance Modeling and Prediction

Author: Castells Pablo
Daly Elizabeth M.
Declerck Thierry
Ekstrand Michael D.
Ferro Nicola
Fuhr Norbert
Geyer Werner
Gonzalo Julio
Grefenstette Gregory
Konstan Joseph A.
Kuflik Tsvi
Lindén Krister
Magnini Bernardo
Nie Jian-Yun
Perego Raffaele
Shapira Bracha
Soboroff Ian
Tintarev Nava
Verspoor Karin
Willemsen Martijn C.
Zobel Justin
Publication date: 01/01/2018

Non peer reviewed

Repository TU/e

Pure OAI Repository

Archivio istituzionale della ricerca - Università di Padova

Recognition of social health: A conceptual framework in the context of dementia research

Author: Brodaty Henry
Chattat Rabih
de Vugt Marjolein
Hubers Claudia
Ikram M. Arfan
Jeon Yun Hee
Lenart-Bugla Marta
Maddock Jane
Marseglia Anna
Melis Rene
Moniz-Cook Esme
Perry Marieke
Richards Marcus
Rymaszewska Joanna
Sachdev Perminder S.
Samtani Suraj
Szczesniak Dorota
van der Velpen Isabelle F.
Vernooij Meike W.
Vernooij-Dassen Myrra
Verspoor Eline
Welmer Anna Karin
Wiegelmann Henrik
Wolf-Ostermann Karin
Publication venue: Frontiers Media SA
Publication date: 01/01/2022

Objective: The recognition of dementia as a multifactorial disorder encourages the exploration of new pathways to understand its origins. Social health might play a role in cognitive decline and dementia, but conceptual clarity is lacking and this hinders investigation of associations and mechanisms. The objective is to develop a conceptual framework for social health to advance conceptual clarity in future studies. Process: We use the following steps: underpinning for concept advancement, concept advancement by the development of a conceptual model, and exploration of its potential feasibility. An iterative consensus-based process was used within the international multidisciplinary SHARED project. Conceptual framework: Underpinning of the concept drew from a synthesis of theoretical, conceptual and epidemiological work, and resulted in a definition of social health as wellbeing that relies on capacities both of the individual and the social environment. Consequently, domains in the conceptual framework are on both the individual (e.g., social participation) and the social environmental levels (e.g., social network). We hypothesize that social health acts as a driver for use of cognitive reserve which can then slow cognitive impairment or maintain cognitive functioning. The feasibility of the conceptual framework is demonstrated in its practical use in identifying and structuring of social health markers within the SHARED project. Discussion: The conceptual framework provides guidance for future research and facilitates identification of modifiable risk and protective factors, which may in turn shape new avenues for preventive interventions. We highlight the paradigm of social health in dementia as a priority for dementia research

Repository@Hull - Worktribe

Maastricht University Research Portal

EUR Research Repository

UCL Discovery

From Evaluating to Forecasting Performance: How to Turn Information Retrieval, Natural Language Processing and Recommender Systems into Predictive Sciences

Author: Castells Pablo
Daly Elizabeth M.
Declerck Thierry
Ekstrand Michael D.
Ferro Nicola
Fuhr Norbert
Geyer Werner
Gonzalo Julio
Grefenstette Gregory
Konstan Joseph A.
Kuflik Tsvi
Lindén Krister
Magnini Bernardo
Nie Jian-Yun
Perego Raffaele
Shapira Bracha
Soboroff Ian
Tintarev Nava
Verspoor Karin
Willemsen Martijn C.
Zobel Justin
Publication date: 01/01/2018

Non peer reviewed

Dagstuhl Research Online Publication Server

Helsingin yliopiston digitaalinen arkisto

Archivio istituzionale della ricerca - Università di Padova

The structural and content aspects of abstracts versus bodies of full text journal articles are different

Author: Alias-i
B Settles
BM Szmrecsányi
C Blaschke
C Friedman
C Gasperin
Christophe Roeder
D Jurafsky
D Klein
DP Corney
G Leroy
Helen L Johnson
I Goldin
J Lin
JG Caporaso
K Bretonnel Cohen
K Verspoor
Karin Verspoor
L Hirschman
L Tanabe
Lawrence E Hunter
M Krallinger
N Elhadad
PG Mutalik
PI Nakov
R Leaman
S Abney
S Agarwal
T McIntosh
W Chapman
W Hersh
WA Baumgartner Jr
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: An increase in work on the full text of journal articles and the growth of PubMedCentral have the opportunity to create a major paradigm shift in how biomedical text mining is done. However, until now there has been no comprehensive characterization of how the bodies of full text journal articles differ from the abstracts that until now have been the subject of most biomedical text mining research. Results: We examined the structural and linguistic aspects of abstracts and bodies of full text articles, the performance of text mining tools on both, and the distribution of a variety of semantic classes of named entities between them. We found marked structural differences, with longer sentences in the article bodies and much heavier use of parenthesized material in the bodies than in the abstracts. We found content differences with respect to linguistic features. Three out of four of the linguistic features that we examined were statistically significantly differently distributed between the two genres. We also found content differences with respect to the distribution of semantic features. There were significantly different densities per thousand words for three out of four semantic classes, and clear differences in the extent to which they appeared in the two genres. With respect to the performance of text mining tools, we found that a mutation finder performed equally well in both genres, but that a wide variety of gene mention systems performed much worse on article bodies than they did on abstracts. POS tagging was also more accurate in abstracts than in article bodies. Conclusions: Aspects of structure and content differ markedly between article abstracts and article bodies. A number of these differences may pose problems as the text mining field moves more into the area of processing full-text articles. However, these differences also present a number of opportunities for the extraction of data types, particularly those found in parenthesized text, that are present in article bodies but not in article abstracts.

Large-scale biomedical concept recognition: an evaluation of current automatic annotators and their parameters

Author: A Aronson
A Doms
A Jimeno
A Koike
A Sokolov
AT McCray
B Settles
Benjamin Garcia
C Brewster
C Jonquet
C Roeder
C Verspoor
Christophe Roeder
Christopher Funk
D Ferrucci
D Hancock
D Rebholz-Schuhmann
DA Natale
DL Wheeler
DS DeLuca
FM Couto
H Liu
H Yu
HM Muller
IBM
J Bard
JC Denny
JG Caporaso
K Bretonnel Cohen
K Degtyarenko
K Eilbeck
K Verspoor
Karin Verspoor
KB Cohen
L Hunter
L Reeve
L Yao
Lawrence E Hunter
M Bada
M Krallinger
M Tanenblatt
Michael Bada
MJ Schuemie
N Kang
N Shah
The Gene Ontology Consortium
P Khatri
PV Ogren
Q Zou
R Leaman
S Ray
S Van Landeghem
SA Stewart
T Rocktaschel
William Baumgartner
WW Chu
Publication venue: Springer Science and Business Media LLC